An Unsupervised System for Identifying English Inclusions in German Text
Abstract
We present an unsupervised system that exploits linguistic knowledge resources, namely English and German lexical databases and the World Wide Web, to identify English inclusions in German text. We describe experiments with this system and the corpus which was developed for this task. We report the classification results of our system and compare them to the performance of a trained machine learner in a series of in- and cross-domain experiments.
Similar resources
Integrating Language Knowledge Resources to Extend the English Inclusion Classifier to a New Language
This paper presents an unsupervised system that classifies English inclusions in written text. It will demonstrate that extending this English inclusion classifier, which was originally designed for German, requires minimal time and effort to adapt to a new language, in this case French. The analysis of several evaluation experiments carried out on French and German data shows that the system p...
Investigating Prosodic Modifications for Polyglot Text-to-Speech Synthesis
This paper investigates the need for applying English prosody when synthesising English portions of mixed English/German texts using a German-based polyglot text-to-speech (TTS) synthesis system. The polyglot system is based on a monolingual German TTS system, which uses a phone mapping from English to German to synthesise English texts. Two systems with varying degrees of assimilation to Engli...
An XML-based Tool for Tracking English Inclusions in German Text
The use of lexicons and corpora advances both linguistic research and performances of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring c...
Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical Documents using UMLS
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using UMLS. We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relations between terms given by UMLS, a method which achieves 74% prec...
Unsupervised Disambiguation for a Multilingual Medical Information System using UMLS
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using the Unified Medical Language System (UMLS). We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relationships between terms given...
Publication date: 2005